Multiclass Classification with Imbalanced Datasets for Car Ownership Demand Model – Cost-Sensitive Learning
نویسندگان
چکیده
In terms of the travel demand prediction from household car ownership model, if imbalanced data were used to support transportation policy via a machine learning it would negatively affect algorithm training process. The on obtained study project for expressway preparation in Khon Kaen Province (2015) was an unbalanced dataset. other words, number members minority class is lower than rest answer classes. result bias classification. Consequently, this research suggested balancing datasets with cost-sensitive methods, including decision trees, k-nearest neighbors (kNN), and naive Bayes algorithms. Before creating 3-class k-folds cross-validation method applied classify define true positive rate (TPR) model’s performance validation. outcome indicated that kNN demonstrated best compared It provides TPR rural suburban area types, which are region types very different imbalance ratios, before 46.9% 46.4%. After (MCN1), values 84.4% 81.4%, respectively.
منابع مشابه
A Cost-Sensitive Ensemble Method for Class-Imbalanced Datasets
and Applied Analysis 3 costs for the positive and negative classes, SVM can be extended to the cost-sensitive setting by introducing an additional parameter that penalizes the errors asymmetrically. Consider that we have a binary classification problem, which is represented by a data set {(x 1 , y 1 ), (x 2 , y 2 ), . . . , (x l , y l )}, where x i ⊂ R represents a k-dimensional data point and ...
متن کاملCost-sensitive Multiclass Classification Risk Bounds
A commonly used approach to multiclass classification is to replace the 0− 1 loss with a convex surrogate so as to make empirical risk minimization computationally tractable. Previous work has uncovered sufficient and necessary conditions for the consistency of the resulting procedures. In this paper, we strengthen these results by showing how the 0− 1 excess loss of a predictor can be upper bo...
متن کاملClassification in Imbalanced Datasets
In this thesis we study the classification task in the presence of class imbalanced data. This task arises in many applications when we are interested in the under-represented (minority) classes. Examples of such applications are related to fraud detection, medical diagnosis and monitoring, text categorization, risk management, information retrieval and filtering. Although there exist many stan...
متن کاملCost-Aware Pre-Training for Multiclass Cost-Sensitive Deep Learning
Deep learning has been one of the most prominent machine learning techniques nowadays, being the state-of-the-art on a broad range of applications where automatic feature extraction is needed. Many such applications also demand varying costs for different types of mis-classification errors, but it is not clear whether or how such cost information can be incorporated into deep learning to improv...
متن کاملCost-sensitive decision tree ensembles for effective imbalanced classification
Real-life datasets are often imbalanced, that is, there are significantly more training samples available for some classes than for others, and consequently the conventional aim of reducing overall classification accuracy is not appropriate when dealing with such problems. Various approaches have been introduced in the literature to deal with imbalanced datasets, and are typically based on over...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Promet-traffic & Transportation
سال: 2021
ISSN: ['1848-4069', '0353-5320']
DOI: https://doi.org/10.7307/ptt.v33i3.3728